(54) Search system and method for providing a fulltext search over web pages of world wide web servers

(57) The present invention provides a search system (10) for providing fulltext search over web pages of world wide web servers. The system saves memory by storing only the text, path and hyperlink data of a web page and excluding extraneous data. The system comprises a server (20) connected to an internet (14), a plurality of data groups (22) with web page data, and a management program (24). A user (16) can input search parameters such as keywords into the search system (10). The management program (24) then uses the search parameters to find matching web pages using an index file (29) within the data groups (22), generates path data for the matched web pages, and outputs the path and text data in a standard http format. Because the search system (10) retrieves only the text and path data of each web page and leaves out extraneous data, the memory space of the server (20) is saved.
Description

[0001] The present invention relates to a search system and method for providing a fulltext search over web pages of world wide web servers.
[0002] The internet has become extremely popular, with more and more web servers connecting to it. This enables users connected to the internet to search a wealth of information. Unfortunately, the number of servers connected to the internet, as well as the number of web pages stored on each server, has become unmanageably large, making it difficult for users to find what they need. To overcome this problem, many web page search systems have been produced. Users may key desired information into these search systems to search servers and web pages.

[0003] To create a database for web pages stored in world wide web servers, the search systems analyze and process data contained in web pages collected from the web servers. A single web page may contain many types of files, including graphics, text, sound and motion files. Additionally, each web server may contain hundreds, thousands, or even tens of thousands of web pages. Creating a database for even a single web server would be an overwhelming task, and the problem is compounded by the fact that a search system must handle hundreds of web servers simultaneously. The resulting demand for computer memory and processing time is unacceptable.

[0004] With this problem in mind, the present invention aims at providing a search system that creates its database by storing only the text, path and hyperlink data of a web page and excluding all extraneous data, thereby solving the above problems.

[0005] This is achieved by the present invention as claimed in claim 1, in that the search system comprises an internet server connected to the internet, a plurality of data groups stored in the server, each of the data groups comprising data from the web pages of one world wide web server connected to the internet, and a management program stored in the server for managing operations of the server and providing users with the fulltext search service over the data groups. Each of the data groups in the server comprises a path file for recording path data of each of the web pages in the world wide web server corresponding to the data group, and an index file for providing fulltext search for text data contained in the web pages of that world wide web server. The management program uses the index file of each data group to find web pages of the corresponding world wide web server that fit the specified search parameter, uses the path file of each data group to find the path data of each of those web pages, and then outputs the result in a predetermined format.
[0006] The invention is illustrated by way of example with reference to the accompanying drawings, in which

Fig.1 is a schematic diagram of a search system for fulltext search of web pages of world wide web servers according to the present invention,

Fig.2 is a functional block diagram of the search system shown in Fig.1,

Fig.3 shows a flowchart for creating a database for a web server by the search system shown in Fig.1, and

Fig.4 is a flowchart for fulltext search executed by the search system shown in Fig.1.

[0007] Please refer to Fig. 1. Fig. 1 is a schematic diagram of the search system 10 for fulltext search of web pages on world wide web servers according to the present invention. Through the internet 14, the search system 10 can connect to the world wide web server 12 and to a user 16. The web server 12 usually comprises a home page and a plurality of web pages for the user to search. To create a database, the search system 10 retrieves web page data from the web server 12 and stores only the text and path data. This method saves time and memory.

[0008] Please refer to Fig. 2. Fig. 2 is a functional block diagram of the search system 10 shown in Fig. 1. The search system 10 comprises a server 20 connected to the internet 14, a plurality of data groups 22, and a management program 24 stored in the server 20. The server 20 comprises a memory 21 for storing programs and data, and a CPU 23 for executing the programs stored in the memory 21. The management program 24 manages the operation of the server 20 and comprises a data group creating module 25 for creating the data groups 22 of the world wide web server 12, and a fulltext search module 27 for performing the fulltext search over the data groups 22. Each of the data groups 22 contains data of the web pages in a single world wide web server 12, and comprises a text file 26 for recording the text data within the web pages stored in the web server 12, a path file 28 for recording the paths of the web pages, and an index file 29 for fulltext search of the text data of the web pages.
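One way to picture the three files of a data group 22 is as in-memory maps. The sketch below is an assumption, not the patented file layout: the text file 26 and path file 28 become dictionaries keyed by a page number, and the index file 29 becomes an inverted index from each word to the pages containing it. The class name `DataGroup`, the method `add_page` and the tokenizer are hypothetical.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (an assumed, deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


@dataclass
class DataGroup:
    """Hypothetical in-memory model of one data group 22 for one web server 12."""

    server_address: str  # the web server's address, kept with the path file 28
    text_file: dict[int, str] = field(default_factory=dict)        # text file 26: page id -> text
    path_file: dict[int, str] = field(default_factory=dict)        # path file 28: page id -> path
    index_file: dict[str, set[int]] = field(default_factory=dict)  # index file 29: term -> page ids

    def add_page(self, path: str, text: str) -> int:
        """Record one page's text and path, then add its words to the inverted index."""
        page_id = len(self.path_file)
        self.text_file[page_id] = text
        self.path_file[page_id] = path
        for term in tokenize(text):
            self.index_file.setdefault(term, set()).add(page_id)
        return page_id


group = DataGroup("http://www.example.com")
group.add_page("/index.html", "Welcome to the example home page")
group.add_page("/products/a.html", "Product A datasheet")
print(sorted(group.index_file["product"]))  # → [1]
```

An inverted index is used here because it lets a keyword lookup go straight to the matching pages without scanning every stored text.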

[0009] The data group creating module 25 creates the data groups 22 for each web server 12 connected to the internet. The data groups 22 provide fulltext search capability to the user 16. To make the data groups 22, the data group creating module 25 first connects to the web server 12 through the internet 14, then uses the text data and path data within each web page to create the text file 26, the path file 28 and the index file 29. These constitute the data groups 22 of the web server 12.
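Using "the text data and path data within each web page" implies a step that separates text and hyperlinks from everything else on a page. A minimal sketch using Python's standard `html.parser` module follows. Treating `<script>` and `<style>` contents and all non-link tags (images, embedded media) as the extraneous data is an assumption, and the names `TextAndLinkExtractor` and `extract` are hypothetical.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Keep only visible text and hyperlink targets; everything else is dropped."""

    SKIP_TAGS = {"script", "style"}  # assumed to count as extraneous content

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        # <img>, <embed>, <audio> and other tags are simply ignored.

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str) -> tuple[str, list[str]]:
    """Return (text, hyperlinks) for one page; the parsed markup is then discarded."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


page = ('<html><head><title>Home</title><script>var x = 1;</script></head>'
        '<body><h1>Welcome</h1><img src="logo.gif">'
        '<a href="/about.html">About us</a></body></html>')
text, links = extract(page)
print(text)   # → Home Welcome About us
print(links)  # → ['/about.html']
```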
EP 0 981 097 A1

[0010] The fulltext search module 27 is used for fulltext search of the data groups 22. To search the web pages of the web server 12, the user inputs a keyword or a combination of keywords. Based on this input, the fulltext search module 27 uses the index file 29 to search the text file 26 in each of the data groups 22 for matching web pages. Finally, the fulltext search module 27 outputs the text data and path data of the matching web pages, taken from the text file 26 and the path file 28, in a standard http web page format. The path file 28 contains the address of the web server 12 and the paths of the web pages whose text data is in the corresponding text file 26.
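The search in [0010] combines the index file, the text file and the path file. The sketch below assumes that a combination of keywords means every keyword must appear (an AND query) and that the server address stored in the path file is joined with each page path to form the output link. `build_group`, `fulltext_search` and the dictionary layout are hypothetical, and the http page formatting is left out.

```python
import re
from urllib.parse import urljoin


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (an assumed, simple rule)."""
    return re.findall(r"\w+", text.lower())


def build_group(server_address: str, pages: list[tuple[str, str]]) -> dict:
    """Build a hypothetical data group: text file, path file and inverted index file."""
    group = {"address": server_address, "text": {}, "path": {}, "index": {}}
    for page_id, (path, text) in enumerate(pages):
        group["text"][page_id] = text
        group["path"][page_id] = path
        for term in tokenize(text):
            group["index"].setdefault(term, set()).add(page_id)
    return group


def fulltext_search(groups: list[dict], keywords: list[str]) -> list[tuple[str, str]]:
    """Return (url, text) for every page that contains all keywords."""
    terms = [t for k in keywords for t in tokenize(k)]
    results = []
    for group in groups:
        # Index file lookup: intersect the page sets of all search terms.
        postings = [group["index"].get(t, set()) for t in terms]
        matches = set.intersection(*postings) if postings else set()
        for page_id in sorted(matches):
            # Path file lookup: server address + internal path gives the full link.
            url = urljoin(group["address"], group["path"][page_id])
            results.append((url, group["text"][page_id]))  # text file lookup
    return results


groups = [
    build_group("http://www.example.com/", [("/index.html", "Fulltext search demo"),
                                            ("/faq.html", "Search tips and FAQ")]),
    build_group("http://www.example.org/", [("/news.html", "Fulltext indexing news")]),
]
print(fulltext_search(groups, ["fulltext", "search"]))
# → [('http://www.example.com/index.html', 'Fulltext search demo')]
```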

[0011] Please refer to Fig. 3. Fig. 3 shows a flowchart for creating the database for the web server 12 by the data group creating module 25 of the search system 10 shown in Fig. 1. The flowchart comprises the following steps:

step30: connecting to the world wide web server 12 through the internet 14;

step31: creating the text file 26 and the path file 28 for the web server 12, and a hyperlink data file, then storing the address of the web server 12 into the path file 28;

step32: requesting the home page of the web server 12;

step33: storing the text data of the home page into the text file 26, storing the path data into the path file 28, storing the hyperlinks into the hyperlink file, creating the index file 29 based on the text data stored in the text file 26, and abandoning all extraneous data in the home page;

step34: using a not-yet-accessed hyperlink from the hyperlink file to request data from a web page in the web server 12;

step35: storing the text data of the web page into the text file 26, storing the path data into the path file 28, checking the web page for hyperlinks not yet stored in the hyperlink file and storing them into the hyperlink file, creating the index file 29 based on the text data stored in the text file 26, and then abandoning the extraneous data within the web page;

step36: checking whether all web pages listed in the hyperlink file have been accessed; if not, going back to step34;

step37: end.
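The loop of steps 30 to 37 is essentially a breadth-first crawl restricted to one server. The sketch below assumes a hypothetical `fetch(url)` callable in place of a real HTTP request (so it can run against an in-memory site), models the hyperlink file as a queue, and keeps only each page's text and path. All names are illustrative, not taken from the patent.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _Extractor(HTMLParser):
    """Collect visible text and <a href> targets; all other markup is discarded."""

    def __init__(self):
        super().__init__()
        self.text, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def create_data_group(server_address, fetch):
    """Steps 30-37 of Fig. 3 as a breadth-first crawl of one server."""
    text_file, path_file, index_file = {}, {}, {}      # step 31
    hyperlink_file, seen = deque(["/"]), {"/"}          # step 32: start at the home page
    while hyperlink_file:                               # step 36: until every link is accessed
        path = hyperlink_file.popleft()                 # step 34
        parser = _Extractor()
        parser.feed(fetch(urljoin(server_address, path)))
        parser.close()
        page_id = len(path_file)
        text_file[page_id] = " ".join(parser.text)      # steps 33/35: keep the text...
        path_file[page_id] = path                       # ...and the path only
        for term in re.findall(r"\w+", text_file[page_id].lower()):
            index_file.setdefault(term, set()).add(page_id)
        for href in parser.links:                       # step 35: record new hyperlinks
            url = urljoin(urljoin(server_address, path), href)
            if urlparse(url).netloc != urlparse(server_address).netloc:
                continue                                # stay on this web server
            new_path = urlparse(url).path or "/"
            if new_path not in seen:
                seen.add(new_path)
                hyperlink_file.append(new_path)
    return {"address": server_address, "text": text_file,
            "path": path_file, "index": index_file}     # step 37


SITE = {  # hypothetical two-page web server
    "http://www.example.com/": '<h1>Home</h1><a href="/a.html">A</a><img src="x.gif">',
    "http://www.example.com/a.html":
        '<p>Page A</p><a href="/">Home</a><a href="http://other.net/">Out</a>',
}


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET against the in-memory site."""
    return SITE[url]


group = create_data_group("http://www.example.com", fake_fetch)
print(group["path"])  # → {0: '/', 1: '/a.html'}
```

The `seen` set plays the role of "not yet stored in the hyperlink file", so each page is requested only once even when pages link back to each other.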

[0012] Using the above procedure, the data group creating module 25 sequentially accesses all web pages in the web server 12, a set number of web pages, or all web pages in a predetermined tree structure, stores the text and path data of each web page into the text file 26 and the path file 28, respectively, and ignores all extraneous data. This method allows the search system 10 to create the data groups 22 efficiently while saving memory space.
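Paragraph [0012] allows three crawl scopes. Below is a hedged sketch of how a crawler might honor them: `max_pages=None` visits every reachable page, an integer caps the count, and `subtree` restricts the crawl to one branch of the site's URL path hierarchy. Reading "predetermined tree structure" as a URL path prefix is an assumption, and `crawl_order`, `site_links` and the sample `LINKS` map are hypothetical.

```python
from collections import deque
from typing import Callable, Optional

LINKS = {  # hypothetical site: page path -> hyperlinks found on that page
    "/": ["/docs/", "/news/"],
    "/docs/": ["/docs/a.html", "/docs/b.html", "/"],
    "/news/": ["/news/today.html"],
    "/docs/a.html": [],
    "/docs/b.html": [],
    "/news/today.html": [],
}


def site_links(path: str) -> list[str]:
    """Stand-in for fetching a page and extracting its hyperlinks."""
    return LINKS.get(path, [])


def crawl_order(links_of: Callable[[str], list[str]],
                max_pages: Optional[int] = None, subtree: str = "/") -> list[str]:
    """Breadth-first visit order, limited by page count and/or a path prefix."""
    start = subtree if subtree.endswith("/") else subtree + "/"
    queue, seen, visited = deque([start]), {start}, []
    while queue and (max_pages is None or len(visited) < max_pages):
        path = queue.popleft()
        visited.append(path)
        for link in links_of(path):
            # Only follow links inside the chosen branch that were not queued before.
            if link.startswith(start) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl_order(site_links))                   # all pages
print(crawl_order(site_links, max_pages=3))      # a set number of pages
print(crawl_order(site_links, subtree="/docs"))  # one branch of the tree
```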
[0013] Please refer to Fig. 4. Fig. 4 is a flowchart showing the fulltext search process performed by the fulltext search module 27 within the search system 10. The procedure comprises the following steps:

step40: connecting to the search system 10 through the internet 14;

step41: inputting a keyword into the search system 10;

step42: searching the index file 29 of each data group 22 for the corresponding index data based on the keyword;

step43: searching the text file 26 and path file 28 of each data group 22 for corresponding text and path data based on the index data corresponding to the keyword;

step44: combining the text and path data, then outputting the data.
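Steps 40 to 44 can be pictured as a small web application: the user reaches the search system over HTTP, passes the keyword in the query string, and receives an HTML list of hyperlinked results. The sketch below is one possible realization using Python's standard WSGI tooling (`wsgiref`). The `q` query parameter, the sample `GROUPS` data and the AND combination of keywords are assumptions, not details from the patent.

```python
import html
import re
from urllib.parse import parse_qs, urljoin

# Hypothetical pre-built data group (text file 26, path file 28, index file 29).
GROUPS = [{
    "address": "http://www.example.com/",
    "text": {0: "Fulltext search demo", 1: "Search tips"},
    "path": {0: "/index.html", 1: "/tips.html"},
    "index": {"fulltext": {0}, "search": {0, 1}, "demo": {0}, "tips": {1}},
}]


def search_app(environ, start_response):
    """WSGI sketch of steps 40-44; the user sends the keywords as ?q=..."""
    query = parse_qs(environ.get("QUERY_STRING", "")).get("q", [""])[0]  # step 41
    terms = re.findall(r"\w+", query.lower())
    items = []
    for group in GROUPS:
        # Step 42: look up each keyword in the index file.
        postings = [group["index"].get(t, set()) for t in terms]
        hits = set.intersection(*postings) if postings else set()
        for page_id in sorted(hits):
            # Step 43: fetch the text and path data of each hit.
            url = urljoin(group["address"], group["path"][page_id])
            text = group["text"][page_id]
            # Step 44: combine them into one hyperlinked list entry.
            items.append(f'<li><a href="{html.escape(url)}">{html.escape(text)}</a></li>')
    body = f"<html><body><ul>{''.join(items)}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Step 40: users reach the search system over the network.
    from wsgiref.simple_server import make_server
    with make_server("", 8000, search_app) as server:
        server.serve_forever()
```

Running the file and opening `http://localhost:8000/?q=fulltext+search` would return one hyperlinked entry for `/index.html`.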

[0014] In step44, rather than outputting the full text data, the fulltext search module 27 outputs the title or a portion of the text data of each web page, according to the input command from the user. This output data is arranged in a sequence and format in accordance with the http standard. Since the path data of each matching web page is included in the output page in the form of a hyperlink, the user 16 may use the hyperlinks to locate the original web server containing the desired web pages.
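Paragraph [0014] lets the user choose between the page title and a portion of the text. The sketch below shows one hypothetical way to make that choice. It assumes the stored text begins with the page title on its own line, which the patent does not specify, and otherwise cuts a fixed-width excerpt around the first keyword occurrence; `summarize`, `mode` and `width` are illustrative names.

```python
import re


def summarize(text: str, keyword: str, mode: str = "snippet", width: int = 30) -> str:
    """Pick what to show for one hit, according to the user's output choice."""
    if mode == "title":
        # Assumption: the first stored line is the page title.
        return text.split("\n", 1)[0].strip()
    match = re.search(re.escape(keyword), text, re.IGNORECASE)
    if match is None:
        # No keyword found: fall back to the beginning of the text.
        return text[: 2 * width].strip()
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


page = "Widget Catalog\nOur widgets support fulltext search across all product sheets."
print(summarize(page, "fulltext", mode="title"))  # → Widget Catalog
print(summarize(page, "fulltext", width=10))      # → ...s support fulltext search ac...
```

Cutting at character offsets keeps the sketch short; a real system would more likely snap the excerpt to word boundaries.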

[0015] When prior art search systems create databases for world wide web servers, the entire web page is often loaded before the data within the web pages is analyzed and organized and the index data is produced. This process requires a lot of computer memory and processing time. In contrast, the fulltext search system 10 of the present invention saves memory and processing time by storing only the text and path data of the web pages of the web server 12 and abandoning extraneous data.



Claims

1. A search system (10) for providing fulltext search of web pages of world wide web servers connected to an internet (14), comprising:

an internet server (20) connected to the internet (14);

a plurality of data groups (22) stored in the server (20), each of the data groups (22) comprising data from web pages of one world wide web server (12) connected to the internet (14); and

a management program (24) stored in the server (20) for managing operations of the server (20) and providing users with the fulltext search service over the data groups (22);

characterized in that:

each of the data groups (22) in the server (20) comprises:

a path file (28) for recording path data of each of the web pages in the world wide web server (12) corresponding to the data group (22); and

an index file (29) for providing fulltext search for text data contained in the web pages of the world wide web server (12) corresponding to the data group (22); and

according to at least one user-specified search parameter, the management program (24) uses the index file (29) of each data group (22) to find web pages of the corresponding world wide web server (12) which fit the specified search parameter, uses the path file (28) of each data group (22) to find the path data of each of the web pages of the corresponding world wide web server (12) which fit the specified search parameter, and then outputs the result in a predetermined format.

20 

2. The search system (10) of claim 1 wherein each of the data groups (22) stored in the server (20) further comprises a text file (26) for recording the text data contained in each of the web pages of the corresponding world wide web server (12), the path file (28) of each data group (22) is used for recording the path data of each of the web pages contained in the text file (26) of the same data group (22), and the index file (29) of each data group (22) is used for providing fulltext search for the text data contained in the text file (26) of the same data group (22); and wherein after the specified search parameter is provided, the management program (24) uses the index file (29) of each data group (22) to search the text file (26) of the same data group (22) for web pages which fit the search parameter, uses the text file (26) of the same data group (22) to retrieve the text data of each web page which fits the search parameter, uses the path file (28) of the same data group (22) to find the path data of each of the web pages which fit the specified search parameter, and then outputs the result in a predetermined format.

3. The search system (10) of claim 2 wherein the management program (24) outputs the text data and path data of the web pages which fit the specified search parameter in accordance with the http standard web page format.

4. The search system (10) of claim 2 wherein the management program (24) outputs a title portion or part of the text data contained in the web pages which fit the specified search parameter.

5. The search system (10) of claim 2 wherein the 
search parameter is a keyword or a combination of 
keywords. 



6. The search system (10) of claim 2 wherein the path 
file (28) of each data group (22) comprises internal 
paths of all the web pages of the corresponding 
world wide web server (12) and the internet 
address of the world wide web server (12) on the 
internet (14), and wherein the internal paths and 
the internet address are included in the path data 
outputted by the management program (24). 

7. The search system (10) of claim 2 wherein the management program (24) further comprises a data group creating module (25) for creating the data group (22) of each of the world wide web servers for fulltext search, and wherein when creating one data group (22) for a world wide web server (12), the data group creating module (25) first connects to the world wide web server (12) through the internet (14), retrieves text and path data stored in the web pages of the world wide web server (12), creates one text file (26) and one path file (28) using the retrieved data, and then creates one index file (29) using the text file (26) for fulltext search of the text data contained in the text file (26).

8. The search system (10) of claim 7 wherein after 
retrieving the text data and path data contained in 
each of the web pages, the management program 
(24) abandons all the other data to save memory 
space. 

9. A method for creating a data group (22) for a world wide web server (12) connected to an internet (14) in a fulltext search system (10), the search system (10) comprising:

an internet server (20) connected to the internet (14) for storing the data group (22) of the world wide web server (12); and

a management program (24) stored in the server (20) for managing operations of the server (20) and creating the data group (22) of the world wide web server (12);

the data group (22) of the world wide web server (12) comprising:

a path file (28) for recording path data of each of the web pages in the world wide web server (12); and

an index file (29) for providing fulltext search for text data contained in the web pages in the world wide web server (12);

the method of creating the data group (22) comprising:

connecting the server (20) with the world wide web server (12) through the internet (14);

retrieving path data from each of the web pages of the world wide web server (12) to create the path file (28);

using text data contained in each of the web pages of the world wide web server (12) to create the index file (29) for providing fulltext search over the text data of the web pages in the world wide web server (12).

10. The method of claim 9 wherein the data group (22) of the world wide web server (12) further comprises a text file (26) for recording the text data contained in each of the web pages of the world wide web server (12), the path file (28) of the data group (22) is used for recording the path data of each of the web pages contained in the text file (26) of the data group (22), and the index file (29) of the data group (22) is used for providing fulltext search for the text data contained in the text file (26) of the data group (22); and wherein the method further comprises the following step:

retrieving text data from each of the web pages of the world wide web server (12) to create the text file (26).

11. The method of claim 10 wherein after retrieving the text data and path data contained in each of the web pages, the management program (24) abandons all the other data to save memory space.

12. The method of claim 10 wherein when retrieving the text data and path data contained in each of the web pages, the management program (24) can retrieve the data from all the web pages, a predetermined number of web pages, or all of the web pages in a predetermined tree structure from the world wide web server (12).